Generating Landscape Paintings with GANs

December 10th 2019

Deep Learning for Computer Vision

Author John Daciuk

Professor Peter Belhumeur

Introduction

Let's begin by way of analogy with a quote from Max Tegmark's book "Life 3.0":

"Computers are universal machines, their potential extends uniformly over a boundless expanse of tasks. Human potentials, on the other hand, are strong in areas long important for survival, but weak in things far removed. Imagine a “landscape of human competence,” having lowlands with labels like “arithmetic” and “rote memorization,” foothills like “theorem proving” and “chess playing,” and high mountain peaks labeled “locomotion,” “hand-eye coordination” and “social interaction.” Advancing computer performance is like water slowly flooding the landscape."

Screen%20Shot%202019-12-10%20at%201.47.11%20PM.png

In what follows, I will explore the lowlands of the "Art" peak. The goal is to get a sense of what's already possible and what may be possible in the future by leveraging GANs. In particular, I use GANs to generate landscape paintings.

Summary of Work

Data:

Collecting a massive amount of landscape paintings is challenging. Although WikiArt has over 20,000 landscape paintings, currently available scraping scripts are not reliable and Wikiart generally has a poor UI experience. Ultimately, I was able to download 15,000 Wikiart landscape paintings from a google drive linked to on a blog. I set out to then painstakingly supplement this data by way of scraping google images and cleaning the results by hand. This yielded another $\approx$ 12,000 images.

Network Architecture

After lots of failed experimentation, I decided against trying to write my GANs from scratch. Instead I found a clean DCGAN implementation online that worked very well for MNIST out of the box. My job, then, was to modify the shape and hyperparameters to my task. See comments above the generator code for further explanation of how I re-enginereed the network. I also used the training loop code that I found with the GAN, but made small modifications for technical reasons and for better presentation.

Training

For training I used a Tesla V-100 GPU on google cloud. I did all my own training from scratch, starting with random weights.

Results and Analysis

All the code to analyze my GANs and produce results is my own work. Using various architectures I was indeed able to create paintings as beautiful as the data. Although I'm not the first person to use such data to create landscape paintings, similar examples are limited, and I hope that this work can inspire other's who are interested.

Sources

Data:

https://github.com/rkjones4/GANGogh

Network Architecure and Training Loop

Keras DCGAN implementation ready for MNIST: https://github.com/eriklindernoren/Keras-GAN

Similar Projects

I did not look at any of the code or technicalities of similar projects, but they did serve as some confirmation that this project would be successful.

This is a similar project with more complicated architecture. They used only 64x64 images due to compute limitations. Results for landscapes look poor: https://towardsdatascience.com/gangogh-creating-art-with-gans-8d087d8f74a1

Here is who seems to be the king of Wiki Art GANs online: https://github.com/robbiebarrat/art-DCGAN

Papers

Rather than looking at cutting edge theory, I sought to gain a firm foundation with which to explore.

1) Original GAN paper by Ian Goodfellow: https://arxiv.org/abs/1406.2661

2) 2016 DCGAN paper by Alec Radford et al. that forms the theoretical basis for the network architecture I used. Through rigorous experiment develops improvements that stabilize DCGAN training, a term that they coined: https://arxiv.org/abs/1511.06434

Literature Review and Discussion of Theory

Generative models have applications across many domains, but suffer from many limitations and are hence a rich area of research. $ z $ Generative Adversarial Networks were invented in 2013 to bridge deep learning with generative models. GANs pit two neural networks against one another. The generator attempts to fabricate data that appears to come from the true distribution while the discriminator learns to better and better recognize the fakes. The essential idea is the following: once a discriminator has learned to pick out the fakes, the generator can use gradient descent to increase the loss function of the discriminator. In doing so the generator learns its own weakness and takes a learning step towards correction. Consider the key equation from Goodfellow's paper:

$$\min_{G} \max_{D} V(D,G) = \mathbf{E}_{x \sim p_{data}\ \ (x)} \big[ log\ D(x) \big] + \mathbf{E}_{z \sim p_{z}\ (z)} \big[ log (1 - D(G(z))) \big]$$

$z$ is the source the generator takes as input, typically a noise vector in the case of GANs. The discriminator seeks to maximize the right-hand side. The first component of the sum, $\mathbf{E}_{x \sim p_{data}\ \ (x)} \big[ log\ D(x) \big]$, is the expected value of the log of the discriminators prediction when it reads input from the true distribution. If the discriminator is optimal it would output 1.0 whenever fed true data, predicting it to be real, and thus maximizing this expectation. The second component of the sum, $ \mathbf{E}_{z \sim p_{z}\ (z)} \big[ log (1 - D(G(z))) \big] $, enforces that the discriminator also attempt to pick out the fakes. The minus sign in front of $ D(G(z)) $ means that the discriminator needs to predict 0.0 when presented with fakes to optimize. Since the generator aims to minimize these same components, it will need to work in opposition to the discriminator, pulling down it's predictions for true data and pulling up its predications on the fake data. It's worth noting that this equation says nothing about the form of D and G. They need not be neural networks but networks lend themselves very well to optimization which is why we use them.

$ $So to clarify, assuming $z$ is a noise vector stochastically generated, $G(z)$ is also stochastic and forms a probability distribution we'll call $ p_g $. It should be no surprise from the way we've framed the task that $ G $ minimizes the above equation when $ p_g = p_{data} $. $ D $ on the other hand maximizes the equation for a fixed $ G $ when $ D(x) = \frac{p_{data}\ \ (x)}{p_{data}\ \ (x) + p_g (x)} $, which is just to say that $D$ can't do better than judging the probability of data being real as the probability of that data coming from the true distribution over the probability of the data. Although intuitive, Goodfellow actually puts forward a detailed proof of these optimization points. Moreover Goodfellow derives a guarantee that an idealized gradient descent algorithm can make $ p_g $ converge to $p_{data}$ but admits that the guarantee doesn't hold for multilayer perceptrons. Deep nets are limited in the scope of distributions they can produce, yet still in practice work fantastically well.

Evaluating GANs is quite difficult and somewhat ambiguous. We don't have a simple metric like loss because the two networks are in opposition to each other's loss. The situation is analogous to two siblings that play chess with only one another for years. It's hard for either of them to know how well they're playing because they may still only be winning half the games! Working with CI-FAR-10 and MNIST, Goodfellow actually predicts the probability of the test set given $ p_g $. This does not scale well to higher dimensional data because of high variance and often evaluating GANs reduces to visually inspecting the results.

One principle challenege that Goodfellow identifies is that GANs are inherently unstable to train. There's a lot of moving parts and if either network learns too quickly it can be impossible for the other to catch up. For example, the discriminator may be so firm in its predictions that the generator gradients may vanish. It is crucial that the two networks maintain balance and keep that balance for an extended period of time. This is where Radford's 2016 DCGAN paper comes to the rescue. Through much exertion, Radford et al discovered many ways to stabilize training. Leveraging this knowledge makes training GANs today a far more pleasant experience than it was in 2014 or even 2016. Batch norm helps avoid modal collapse, the phenomenon during which the generator produces the same image it thinks will trick the discriminator regardless of $z$. Modal collapse would prevent us from creating a variety of compelling paintings. Batch norm normalizes the inputs to hidden layers to between -1 and 1. With a more consistent input range hidden layers can learn an optimal mapping more quickly because they don't need to adapt as much to the magnitudes of other layers. Radford also discovered that giving Adam a momentum of .5 instead of the default .9 and not using pooling encourage more consistent learning. With less momentum networks will have less variance in their effective learning rates and potentially find a smoother path to reducing loss while still not getting completely stuck when gradients are low.

Code

In [1]:
import tensorflow as tf
import keras as keras
from keras.datasets import mnist
from keras.layers import Input, Dense, Reshape, Flatten, Dropout
from keras.layers import BatchNormalization, Activation, ZeroPadding2D, LeakyReLU, UpSampling2D, Conv2D
from keras.models import Sequential, Model
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
from keras import backend as K
import matplotlib.pyplot as plt
import sys
import numpy as np
from PIL import ImageFile

plt.rcParams["font.size"] = 15 
plt.rcParams["axes.titlesize"] = 20 
Using TensorFlow backend.

Load Data

In [205]:
# I load all the data into RAM because I found the keras data generator to be a bottleneck on Google Cloud

ImageFile.LOAD_TRUNCATED_IMAGES = True

# I can't reasonably do many manipulations to augment the data.  
# I do flip and slight zoom and brightness.  I make wikiArt brighter than the google scrapes because wikiArt
# tends to be darker.  I use more zoom for the google scrapes because some of them could us more cropping.
def array_from_generator(data_path, target_size, zoom=.2, brightness=[1.0, 1.01], num_images=30000, batch_size=50):
    data_generator = ImageDataGenerator(rescale=1.0/255, 
                                        horizontal_flip=True, 
                                        zoom_range=zoom, 
                                        fill_mode="reflect",
                                        brightness_range=brightness)
    
    flow_generator = data_generator.flow_from_directory(data_path, 
                                                        target_size=target_size,
                                                        batch_size=batch_size,
                                                        class_mode="input")
    
    # now fill a np array to put images in RAM
    images = []
    batch_count = 0
    for batch in flow_generator:
        if batch_count % 50 == 0:
            print("Processing batch {}".format(batch_count))
        for img in batch[0]:
            images.append(img)
        batch_count += 1
        if batch_count == num_images / batch_size:
            break

    return np.array(images)

# I grab more images from wikiArt because it's a higher quality dataset.  
# I resize everything to 128 x 128.  This is actually relatively large for GAN training, but obviously
# much desirable to 64 x 64.  Although I found I could manage 128 x 128, training was too slow to try anything bigger.
google_landscapes = array_from_generator(data_path="bucket_data/bucket-1984/google_data_cleaned", target_size=(128,128), zoom=.3, num_images=17000)
wiki_landscapes = array_from_generator(data_path="bucket_data/bucket-1984/data", target_size=(128,128), brightness=[1.0, 1.09], num_images=28000)
Found 11663 images belonging to 1 classes.
Processing batch 0
Processing batch 50
Processing batch 100
Processing batch 150
Processing batch 200
Processing batch 250
Processing batch 300
Found 14999 images belonging to 1 classes.
Processing batch 0
Processing batch 50
Processing batch 100
Processing batch 150
Processing batch 200
Processing batch 250
Processing batch 300
Processing batch 350
Processing batch 400
Processing batch 450
Processing batch 500
Processing batch 550

Inspect Wiki Art Landscapes

In [13]:
# I show many images throughout, so it's nice to have a function that takes care of boiler-plate code
def show_image(ax, image, title=None, fontsize=12):
    ax.imshow(image)
    ax.axis("off")
    if title is not None:
        ax.set_title(title, fontsize=fontsize)
        
# I use this function often throughout.  It simply shows random images from some image array.
def plot_random_images(images, rows, cols, title, show_idx=False, y=.93, scale=1):
    fig, axs = plt.subplots(rows, cols, figsize=(int(12*scale), int(12 * rows / 5 * scale)))
    for i in range(rows):
        for j in range(cols):
            idx = np.random.randint(0, len(images))
            axs[i,j].imshow(images[idx])
            axs[i,j].axis('off')
            if show_idx:
                axs[i,j].set_title("idx = {}".format(idx))

    fig.suptitle(title, y=y, fontsize=24)

print("Shape of wiki landscapes: {}\n".format(wiki_landscapes.shape))
plot_random_images(wiki_landscapes, 5, 5, "Random Sample of Wiki Landscapes")
Shape of wiki landscapes: (28000, 128, 128, 3)

Comments:

I've taken 15,000 wiki art landscapes and brought it up to 28,000 with data augmentation. As we can see, some of these images don't actually look like landscapes, but for the most part they are reasonably good. I hope for the sake of future GAN artists that datasets like this will grow in size and quality. See https://www.wikiart.org.

Scraped Google Images Dataset

To find a large quantity of additional data I have scraped google images. In order to increase the variety of results I have used many keywords in addition to, of course, "landscape paintings". These keywords were followed by either "landscape paintings" or "paintings to complete the search.

Search Keywords:

Fantasy, Digital, Magical, Simple, Beautiful, Classic, Modern, Autumn, Spring, Winter, Summer, European, American, Asian, Russian, Spanish, Austrian, French, Colorado, California, Alaska, 18th century, 19th century, Impressionist, Bucolic, Countryside, Yellowstone, Yosemite, Mountain, Woods, Waterfall, Watercolor, Valley, Tree, Scenery, Rolling Hill, Rockies, River, Nature, National Park, Meadow, Land, Forest, Canyon.

One drawback to google image scraping is the high percentage of undesirable results. While my initial scrapes returned over 16,000 images, I found about 25% of these to be problematic. Many images had large borders, frames, extraneous background context, text, occlusion or were not even landscape paintings; other images were marketing gimmicks. I was initially tempted to filter out the poor images with a discriminator model; however, as we'll see even many great landscapes from google are very different from the wikiArt dataset. Ultimately I cleaned the dataset myself, erasing over 4,000 images. A random sample of images below displays the quality and variation of the filtered google dataset.

Inspect Google Scraped Landscapes

In [5]:
print("Shape of google_scapes: {}\n".format(google_landscapes.shape))
plot_random_images(google_landscapes, 5, 5, "Random Sample of Google Image Landscapes")
Shape of google_scapes: (16963, 128, 128, 3)

Create Combined Dataset

In [207]:
# Some images were rejected by the data loader, thus we don't have quite an even 45,000
landscapes = np.vstack((wiki_landscapes, google_landscapes))
print("Shape of whole dataset: {}".format(landscapes.shape))
Shape of whole dataset: (44962, 128, 128, 3)

Exploratory Data Analysis: Discriminating between Wiki Art and Google Images

It's interesting to see the extent to which Wiki Art differs from the Google scraped images. One way to find out is to create a discriminator network and see how easily it can manage. I don't actually use a validation set here because the purpose is just to get a quick idea as well as a visual of the differences.

In [72]:
# create labels, 1 for Wiki Art, 0 for Google scrapes
y_1 = np.ones(wiki_landscapes.shape[0])
y_2 = np.zeros(google_landscapes.shape[0])
y = np.append(y_train1, y_train2)
y.shape
Out[72]:
(49974,)
In [ ]:
# Here I use the same discriminator architecture that I'll use for my GANs
# I did not write this code, but I did make modifications to the input shape
# and generalized the notion of network complexity to easily allow me to try different numbers of filters
# Since we need to go from an image to a 1 dimensional prediction the number of filters increases
# as we go to deeper layers, while the surface dimensions of the convolutions decrease.
# Note that I multiply the depth in each layer, this is increasing the number of filters.
# See the following for the code I expanded on: https://github.com/eriklindernoren/Keras-GAN

depth = 256
gw_discriminator = Sequential()
gw_discriminator.add(Conv2D(depth, kernel_size=3, strides=2, input_shape=(128,128,3), padding="same"))
gw_discriminator.add(LeakyReLU(alpha=0.2))
gw_discriminator.add(Dropout(0.25))
gw_discriminator.add(Conv2D(depth * 2, kernel_size=3, strides=2, padding="same"))
gw_discriminator.add(ZeroPadding2D(padding=((0,1),(0,1))))
gw_discriminator.add(BatchNormalization(momentum=0.8))
gw_discriminator.add(LeakyReLU(alpha=0.2))
gw_discriminator.add(Dropout(0.25))
gw_discriminator.add(Conv2D(depth * 4, kernel_size=3, strides=2, padding="same"))
gw_discriminator.add(BatchNormalization(momentum=0.8))
gw_discriminator.add(LeakyReLU(alpha=0.2))
gw_discriminator.add(Dropout(0.25))
gw_discriminator.add(Conv2D(depth * 8, kernel_size=3, strides=1, padding="same"))
gw_discriminator.add(BatchNormalization(momentum=0.8))
gw_discriminator.add(LeakyReLU(alpha=0.2))
gw_discriminator.add(Dropout(0.25))
gw_discriminator.add(Flatten())
gw_discriminator.add(Dense(1, activation='sigmoid'))
        
gw_discriminator.compile(loss='binary_crossentropy',
                         optimizer=Adam(1e-4, 0.9),
                         metrics=['accuracy'])

hist = gw_discriminator.fit(landscapes, y, epochs=15)
In [83]:
metrics = gw_discriminator.evaluate(landscapes, y)
print("Training accuracy discriminating between wiki art and google images is: {:.3f}".format(metrics[1]))
49974/49974 [==============================] - 58s 1ms/sample - loss: 0.4639 - acc: 0.8892
Training accuracy discriminating between wiki art and google images is: 0.889

Comments:

We're easily getting close to 90% training accuracy in just 1 epoch. As expected because the two datasets comes from two totally different distributions. It should be helpful for final results to show the GAN such a variety of landscapes. Let's go ahead and look at the google images judged most like Wiki Art and those least like Wiki Art.

In [100]:
# get discriminator predictions
preds = gw_discriminator.predict(google_landscapes)
preds = preds.reshape(-1)
In [101]:
sorted_preds = np.argsort(preds)  # sort predictions
In [113]:
# show images that correspond with the lowest predictions
fig, ax = plt.subplots(5,5, figsize=(15,15))
for row in range(5):
    for col in range(5):
        i = np.random.randint(500)
        idx = sorted_preds[i]
        show_image(ax[row, col], google_landscapes[idx], "Pred={}".format(preds[idx]))
        
fig.suptitle("Google Images Least Like Wiki Art", fontsize=22, y=.95);
In [111]:
# Now show the google images that the discriminator thinks are most like Wiki Art
fig, ax = plt.subplots(5,5, figsize=(15,15))

for row in range(5):
    for col in range(5):
        i = np.random.randint(500)
        idx = sorted_preds[-i]
        show_image(ax[row, col], google_landscapes[idx], "Pred={:.3f}".format(preds[idx]))
        
fig.suptitle("Google Images Most Like Wiki Art", fontsize=22, y=.95);

Comments:

We can see that the google scrapes least like Wiki Art tend to have neon colors and are more playful. Some of the google images lack artistic substance and likely aim to attract the attention of shoppers browsing the internet. This is part of the reason why I used fewer google images in my dataset.

Build GAN

Starting with the generator

In [38]:
# Here we see the architecture from Radford's 2016 DCGAN paper
# Although Radford's first conv layer has 1024 filters, the most I tried was 256 due to slow training and
# memory issues with my google cloud instance.  The complexity parameter is the filters in the first layer.
# Since we need to go from a noise vector to an image, we use upsampling.  As the surface dimensions increase
# we decrease the number of filters; thus we get this appealing pyramid look to our generator as described in the paper.
# A latent dimension of 100 is fairly standard, so I stuck with it.
# I experimented with the kernels and generally found that 5 or 7 is best for the shape of my data.

# See the following for the generator code I expanded on: https://github.com/eriklindernoren/Keras-GAN

def build_generator(img_rows=128, latent_dim=100, channels=3, complexity=256, upsample_layers=2, kernel=5, summary=False):
        model = Sequential()
        L1_size = int(img_rows / (2 ** upsample_layers))

        model.add(Dense(complexity * L1_size ** 2, activation="relu", input_dim=latent_dim))
        # reshape the noise vector into what can already be interpreted as a small image with many channels
        model.add(Reshape((L1_size, L1_size, complexity)))
        model.add(UpSampling2D())
        
        # this loop allows for arbitrary numbers of Conv layers with the same number of filters for simplicity
        for _ in range(upsample_layers - 1):
            model.add(Conv2D(int(0.5 * complexity), kernel_size=kernel, padding="same"))
            model.add(BatchNormalization(momentum=0.8))
            model.add(Activation("relu"))
            model.add(UpSampling2D())
        
        # again now reduce the number of filters.  Upsampling is bringing us to the larger surface dimensions we need
        model.add(Conv2D(int(0.25 * complexity), kernel_size=kernel, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(Activation("relu"))
        
        model.add(Conv2D(channels, kernel_size=kernel, padding="same"))
        model.add(Activation("tanh"))  # tanh because the discriminator will work better with -1 to 1 inputs

        if summary:
            model.summary()

        noise = Input(shape=(latent_dim,))
        img = model(noise)

        return Model(noise, img)

Build Discriminator

In [39]:
# The same discriminator I used to discriminate between Wiki Art and Google scrapes
# I experimented less with the discriminator than the generator since it has the easier job
# Note the increasing filters as we move deeper down the network.

# See the following for the code I expanded on: https://github.com/eriklindernoren/Keras-GAN

def build_discriminator(img_shape=(128,128,3), complexity=128, summary=False):

        model = Sequential()
        
        model.add(Conv2D(complexity, kernel_size=3, strides=2, input_shape=img_shape, padding="same"))
        model.add(LeakyReLU(alpha=0.2))  # LeakyReLU is another often prescribed tip for GANs
        model.add(Dropout(0.25))
        
        model.add(Conv2D(complexity * 2, kernel_size=3, strides=2, padding="same"))
        model.add(ZeroPadding2D(padding=((0,1),(0,1))))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        
        model.add(Conv2D(complexity * 4, kernel_size=3, strides=2, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        
        model.add(Conv2D(complexity * 8, kernel_size=3, strides=1, padding="same"))
        model.add(BatchNormalization(momentum=0.8))
        model.add(LeakyReLU(alpha=0.2))
        model.add(Dropout(0.25))
        model.add(Flatten())
        model.add(Dense(1, activation='sigmoid'))

        if summary:
            model.summary()

        img = Input(shape=img_shape)
        validity = model(img)

        return Model(img, validity)
In [ ]:
# I used the suggested hyperparams from my sources, but decreased the discriminator learning rate to help stabilize training
DIS_OPTIMIZER = Adam(1e-4 * 1.5, 0.5)
GEN_OPTIMIZER = Adam(1e-4 * 2, 0.5)
LATENT_DIM = 100

Discriminator Summary Example

In [41]:
discriminator = build_discriminator(summary=True)
discriminator.compile(loss='binary_crossentropy', optimizer=DIS_OPTIMIZER, metrics=['accuracy'])
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_8 (Conv2D)            (None, 64, 64, 128)       3584      
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 64, 64, 128)       0         
_________________________________________________________________
dropout_5 (Dropout)          (None, 64, 64, 128)       0         
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 32, 32, 256)       295168    
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 33, 33, 256)       0         
_________________________________________________________________
batch_normalization_6 (Batch (None, 33, 33, 256)       1024      
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 33, 33, 256)       0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 33, 33, 256)       0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 17, 17, 512)       1180160   
_________________________________________________________________
batch_normalization_7 (Batch (None, 17, 17, 512)       2048      
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 17, 17, 512)       0         
_________________________________________________________________
dropout_7 (Dropout)          (None, 17, 17, 512)       0         
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 17, 17, 1024)      4719616   
_________________________________________________________________
batch_normalization_8 (Batch (None, 17, 17, 1024)      4096      
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU)    (None, 17, 17, 1024)      0         
_________________________________________________________________
dropout_8 (Dropout)          (None, 17, 17, 1024)      0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 295936)            0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 295937    
=================================================================
Total params: 6,501,633
Trainable params: 6,498,049
Non-trainable params: 3,584
_________________________________________________________________

Generator Summary Example

In [42]:
generator = build_generator(complexity=256, upsample_layers=2, summary=True)
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_4 (Dense)              (None, 262144)            26476544  
_________________________________________________________________
reshape_2 (Reshape)          (None, 32, 32, 256)       0         
_________________________________________________________________
up_sampling2d_3 (UpSampling2 (None, 64, 64, 256)       0         
_________________________________________________________________
conv2d_12 (Conv2D)           (None, 64, 64, 128)       819328    
_________________________________________________________________
batch_normalization_9 (Batch (None, 64, 64, 128)       512       
_________________________________________________________________
activation_4 (Activation)    (None, 64, 64, 128)       0         
_________________________________________________________________
up_sampling2d_4 (UpSampling2 (None, 128, 128, 128)     0         
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 128, 128, 64)      204864    
_________________________________________________________________
batch_normalization_10 (Batc (None, 128, 128, 64)      256       
_________________________________________________________________
activation_5 (Activation)    (None, 128, 128, 64)      0         
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 128, 128, 3)       4803      
_________________________________________________________________
activation_6 (Activation)    (None, 128, 128, 3)       0         
=================================================================
Total params: 27,506,307
Trainable params: 27,505,923
Non-trainable params: 384
_________________________________________________________________

Comments:

The discriminator is flattened in one of the last layers, so how exactly the dimensions work out through the network is less critical than it is for the generator. Fortunately, working out the generator dimensions is not as hard as it may first appear. Because we use "same" padding, we can rely on the upsampling to double the image surface dimensions at each application! For example, if we want to get to 128 x 128, we can start with 32 x 32 and upsample twice. My build generator function takes this all into account and can build a network for any arbitrary number of first layer filters, upsample layers and kernel size. This made it much easier to experiment with different parameter values than having to work out the dimensions again and again.

Combine Network

In [98]:
# To build the combined network, we hook the discriminator up to the end of the generator.
# See the following for the code I expanded on: https://github.com/eriklindernoren/Keras-GAN

def build_combined(gen, dis):
    z = Input(shape=(LATENT_DIM,))
    img = gen(z)
    dis.trainable = False
    valid = dis(img)
    combined = Model(z, valid)
    combined.compile(loss='binary_crossentropy', optimizer=GEN_OPTIMIZER)
    return combined

# Convenient function for building the entire GAN with the parameters I'm interested to test
def build_gan(gen_complexity, gen_upsample_layers, kernel):
    discriminator = build_discriminator()
    discriminator.compile(loss='binary_crossentropy', optimizer=DIS_OPTIMIZER, metrics=['accuracy'])
    generator = build_generator(complexity=gen_complexity, upsample_layers=gen_upsample_layers, kernel=kernel)
    combined = build_combined(generator, discriminator)
    return generator, discriminator, combined

Training Loop

In [99]:
# Code to plot images while training.  We can not tell how training is going just by looking at loss as discussed.
# See the following for the code I expanded on: https://github.com/eriklindernoren/Keras-GAN

def plot_imgs(generator, epoch):
        r, c = 1, 5
        noise = np.random.normal(0, 1, (r * c, latent_dim))
        gen_imgs = generator.predict(noise)

        # Rescale images 0 - 1
        gen_imgs = 0.5 * gen_imgs + 0.5

        fig, axs = plt.subplots(r, c, figsize=(14,4))
        cnt = 0
        for j in range(c):
            axs[j].imshow(gen_imgs[cnt])
            axs[j].axis('off')
            cnt += 1
        plt.show()
In [101]:
# See the following for the code I expanded on: https://github.com/eriklindernoren/Keras-GAN

def train(real_images, generator, discriminator, combined, epochs, save_dir, batch_size=64, fuzzy=False):
        # keep track of losses if we want to plot and see how the networks are converging
        d_losses = []
        g_losses = []
        d_accs = []
        
        # Rescale from 0,1 to -1,1.  Typically better for NN input.
        X_train = real_images * 2 - 1

        # I added the ability to do fuzzy truth labels for the real images
        # This is a common tip that helped to keep the discriminator from getting too strong too quickly
        # Ultimately I trained alternating this option to help actively manage training stability
        if fuzzy:
            valid = np.clip(np.random.normal(.92, .05, (batch_size, 1)), 0.0, 1.0)
        else:
            valid = np.ones((batch_size, 1))
        
        fake = np.zeros((batch_size, 1))

        for epoch in range(1, epochs + 1):
            num_batches = int(len(X_train) / batch_size)
            for batch in range(num_batches):

                # ---------------------
                #  Train Discriminator
                # ---------------------

                # Select a random batch of images
                idx = np.random.randint(0, X_train.shape[0], batch_size)
                imgs = X_train[idx]

                # Sample noise and generate a batch of new images
                noise = np.random.normal(0, 1, (batch_size, latent_dim))
                gen_imgs = generator.predict(noise)

                # Train the discriminator (real classified as ones or fuzzy and generated as zeros)
                d_loss_real = discriminator.train_on_batch(imgs, valid)
                d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
                d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)

                # ---------------------
                #  Train Generator
                # ---------------------

                # Train the generator (wants discriminator to mistake images as real)
                g_loss = combined.train_on_batch(noise, valid)

                # Save the progress
                d_losses.append(d_loss[0])
                g_losses.append(g_loss)
                d_accs.append(100 * d_loss[1])
                    
            # At some regular interval I print the images to monitor training.
            # Also, I save the models.  It's not unlikely that an earlier epoch will result in a
            # better model than the last epoch.
            if epoch % 1 == 0:
                print ("Epoch: %d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, d_loss[0], 100*d_loss[1], g_loss))
                plot_imgs(generator, epoch)
                generator.save("models/" + save_dir + "/gen.{}.h5".format(epoch))
                discriminator.save("models/" + save_dir + "/dis.{}.h5".format(epoch))
                combined.save("models/" + save_dir + "/comb.{}.h5".format(epoch))
        
        return d_losses, g_losses, d_accs

Training Results

In [5]:
# The code from here on out is again all my own
# Model training for GANs is a verbose process because we like to inspect the images as they are generated
# It's also not very illuminating to see loss or accuracy while training
# For the sake of presentation I'll pull some of the better models I trained from disk
# Here we just organize all the paths.  The numbers that act as dict keys are training epochs

TRAINED_MODEL_NAMES = {"3-layer_256-complexity_5-kernel_fuzzy": {5: "models/2_256_fuzzy/gen.5.h5",
                                                                 10: "models/2_256_fuzzy/gen.10.h5",
                                                                 15: "models/2_256_fuzzy/gen.15.h5",
                                                                 25: "models/2_256_fuzzy/gen.25.h5",
                                                                 50: "models/2_256_fuzzy/gen.50.h5",
                                                                 70: "models/2_256_fuzzy2/gen.20.h5",
                                                                 100: "models/2_256_fuzzy3/gen.30.h5",
                                                                 130: "models/2_256_fuzzy3/gen.60.h5",
                                                                 170: "models/2_256_fuzzy3/gen.100.h5",
                                                                 200: "models/2_256_fuzzy4/gen.30.h5"},
              
               "3-layer_128-complexity_3-kernel": {5: "models/2_128/gen.5.h5",
                                                   10: "models/2_128/gen.10.h5",
                                                   15: "models/2_128_r2/gen.5.h5",
                                                   25: "models/2_128_r2/gen.15.h5",
                                                   50: "models/2_128_r3/gen.25.h5",
                                                   75: "models/2_128_r3/gen.50.h5"},
               
               "4-layer-256-complexity_7-kernel": {7: "models/3_256_7/gen.7.h5",
                                                   10: "models/3_256_7_r2/gen.3.h5",
                                                   15: "models/3_256_7_r3/gen.3.h5",
                                                   35: "models/3_256_7_r6/gen.8.h5",
                                                   10: "models/3_256_7_r2/gen.3.h5"}}

readable_names = {"3-layer_256-complexity_5-kernel_fuzzy": "3 Conv Layers, 256 First Layer Filters, 5 Kernel Size, Fuzzy Label Training",
                  "3-layer_128-complexity_3-kernel": "3 Conv Layers, 128 First Layer Filters, 3 Kernel Size",
                  "4-layer-256-complexity_7-kernel": "4 Conv Layers, 256 First Layer Filters, 7 Kernel Size" }
In [6]:
# Takes a generator model and creates and number of images, very useful!
def generate_images(generator, num_imgs):
    noise = np.random.normal(0, 1, (num_imgs, LATENT_DIM))
    gen_imgs = generator.predict(noise)  # generating images is just the generator predicting
    # Rescale images 0 - 1
    gen_imgs = 0.5 * gen_imgs + 0.5
    return gen_imgs, noise

# takes a model name as defined above and shows regular intervals of training results as I specified above
def show_model_training(model_name, tightfit=1, rows=1, width=30, titlesize=30):
    model_paths = TRAINED_MODEL_NAMES[model_name]
    epochs = list(model_paths.keys())
    epochs.sort()
    show_epochs = len(epochs)
    fig, ax = plt.subplots(rows, int(show_epochs/rows), figsize=(width, 5*rows))
    for i, epoch in enumerate(epochs):
        # generate image from model saved at that epoch
        path = model_paths[epoch]
        generator = load_model(path)
        img = generate_images(generator, num_imgs=1)[0][0]

        if rows==1:
            show_image(ax[i], img, "Epoch " + str(epoch), fontsize=23)
        if rows==2:
            show_image(ax[int(i/5), i%5], img, "Epoch " + str(epoch), fontsize=23)
            
        
    fig.suptitle(readable_names[model_name], fontsize=titlesize, y=tightfit)

Display of Training Process for 3 Different Generator Architectures

Each epoch typically took 10-15 minutes to train.

In [10]:
show_model_training("3-layer_128-complexity_3-kernel", tightfit=1.05)
In [13]:
show_model_training("4-layer-256-complexity_7-kernel", tightfit=1, width=16, titlesize=24)
In [14]:
show_model_training("3-layer_256-complexity_5-kernel_fuzzy", rows=2)

Comments:

Above we see randomly generated images during each networks training process. These were not handpicked. We can see that our best GAN is the one with 3 conv layers, an initial layer filter size of 256 and a kernel size of 5x5. At 15 epochs it starts to produce blurry landscape paintings, and at 70 layers results have sharpened substantially. We can see clouds, grass, trees, mountains, sunsets and flowers all emerge. The fact that I used fuzzy labels while training this model helped speed up early training and stabilize long term training. I'll refer here on to this model as the "best_gan".

By comparison the network with only 128 filters on the first layer and a 3x3 kernel produces a less convincing image at 75 epochs. Yet, even here we can see that the generator has learned to produce some interesting lighting effects, blue skies and horizons. This model will be refered to going forward as the "weak_gan".

The deeper generator with 4 conv layers and a 7x7 kernel took much longer to train and didn't seem to be going anywhere, but judging from it's image at 35 epochs it was doing better than I had thought! We may be seeing a sun, sky and grass. Moreover, that image has a dreamy and smooth feel to it that arn't quite captured in the other networks. This model will be refered to here on as the "deep_gan".

Here we only see 1 image from each model for each epoch. That's not really representative of what the models may be capable of, hence I've kep't all of these models to use for the rest of the project. It will be interesting to further inspect what we can get out of them and how they differ!

Declare Final Models from Each Architecture

In [16]:
def get_model(model_name, epoch):
    path = TRAINED_MODEL_NAMES[model_name][epoch]
    return load_model(path)

# The numbers after each model signify the number of epochs they trained for
# See above for the naming convention
best_gan70 = get_model("3-layer_256-complexity_5-kernel_fuzzy", 70)
best_gan100 = get_model("3-layer_256-complexity_5-kernel_fuzzy", 100)
best_gan170 = get_model("3-layer_256-complexity_5-kernel_fuzzy", 170)
best_gan200 = get_model("3-layer_256-complexity_5-kernel_fuzzy", 170)

# Only really one epoch worth keeping for these two
weak_gan75 = get_model("3-layer_128-complexity_3-kernel", 75)
deep_gan35 = get_model("4-layer-256-complexity_7-kernel", 35)

More Random Images from Each Model

Before I start cherry-picking images, let's get a sense of what our GANs typically produce.

In [18]:
# Produce 2,000 noise vectors and 2,000 corresponding images for each model we will study
# One beautiful thing about generators is how quickly they predict images
best_gan70_imgs, best_gan70_noise = generate_images(best_gan70, 2000)
best_gan100_imgs, best_gan100_noise = generate_images(best_gan100, 2000)
best_gan170_imgs, best_gan170_noise = generate_images(best_gan170, 2000)
best_gan200_imgs, best_gan200_noise = generate_images(best_gan200, 2000)

weak_gan75_imgs, weak_gan75_noise = generate_images(weak_gan75, 2000)
deep_gan35_imgs, deep_gan35_noise = generate_images(deep_gan35, 2000)
In [198]:
plot_random_images(best_gan70_imgs, 2, 5, "Random Sample of Best Gan After 70 Epochs", show_idx=False, y=.98, scale=1.2)
In [199]:
plot_random_images(best_gan100_imgs, 2, 5, "Random Sample of Best Gan After 100 Epochs", show_idx=False, y=.98, scale=1.2)
In [200]:
plot_random_images(best_gan170_imgs, 2, 5, "Random Sample of Best Gan After 170 Epochs", show_idx=False, y=.98, scale=1.2)
In [201]:
plot_random_images(best_gan200_imgs, 2, 5, "Random Sample of Best Gan After 200 Epochs", show_idx=False, y=.98, scale=1.2)
In [202]:
plot_random_images(weak_gan75_imgs, 2, 5, "Random Sample of Weak Gan After 75 Epochs", show_idx=False, y=.98, scale=1.2)
In [203]:
plot_random_images(deep_gan35_imgs, 2, 5, "Random Sample of Deep Gan After 35 Epochs", show_idx=False, y=.98, scale=1.2)

Comments:

No model consistently produces images that look like landscape paintings. However, it's not rare to find some gems. More hyper-parameter tuning and training time would likely lead to more consistent results. Using 64 x 64 images also makes consistent training much easier, but such tiny images are not exciting. Let's go ahead and cherry pick the results now to see what gems are lurking in the latent space.